Biostat 212a Homework 1

Due Jan 23, 2024 @ 11:59PM

Author

Brilla Meng UID: 806329681

Published

January 23, 2024

1 Filling gaps in lecture notes (10pts)

Consider the regression model \[ Y = f(X) + \epsilon, \] where \(\operatorname{E}(\epsilon) = 0\).

1.1 Optimal regression function

Show that the choice \[ f_{\text{opt}}(X) = \operatorname{E}(Y | X) \] minimizes the mean squared prediction error \[ \operatorname{E}\{[Y - f(X)]^2\}, \] where the expectation averages over variations in both \(X\) and \(Y\). (Hint: condition on \(X\).)

answer: Conditioning on \(X = x\) and expanding the square, \[ \operatorname{E}\{[Y - f(X)]^2 \mid X=x\} = \operatorname{E}(Y^2 \mid X=x) - 2f(x)\operatorname{E}(Y \mid X=x) + f(x)^2. \] Taking the derivative with respect to \(f(x)\) and setting it to zero, \[ -2\operatorname{E}(Y \mid X=x) + 2f(x) = 0 \quad\Longrightarrow\quad f(x) = \operatorname{E}(Y \mid X=x). \] Since the second derivative (2) is positive, this choice minimizes the conditional mean squared error at every \(x\), and hence minimizes \(\operatorname{E}\{[Y - f(X)]^2\}\) overall. Therefore \(f_{\text{opt}}(X) = \operatorname{E}(Y \mid X)\).
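
A quick numerical illustration (the model \(Y = X^2 + \epsilon\) is an assumption chosen only for this sketch): the conditional mean attains a smaller empirical MSE than other candidate prediction functions.

```r
# Hypothetical model: Y = X^2 + eps with eps ~ N(0, 0.5^2), so E[Y | X] = X^2
set.seed(212)
n <- 1e5
x <- runif(n, -1, 1)
y <- x^2 + rnorm(n, sd = 0.5)
mse <- function(f) mean((y - f(x))^2)
mse(function(x) x^2)        # conditional mean: empirical MSE close to Var(eps) = 0.25
mse(function(x) x)          # any other choice of f gives a larger MSE
mse(function(x) 0.5 * x^2)
```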

1.2 Bias-variance trade-off

Given an estimate \(\hat f\) of \(f\), show that the test error at a point \(x_0\) can be decomposed as \[ \operatorname{E}\{[y_0 - \hat f(x_0)]^2\} = \underbrace{\operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2}_{\text{MSE of } \hat f(x_0) \text{ for estimating } f(x_0)} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}, \] where the expectation averages over the variability in \(y_0\) and \(\hat f\).

answer: Add and subtract \(f(x_0)\) inside the square: \[ \operatorname{E}\{[y_0 - \hat f(x_0)]^2\} = \operatorname{E}\{[y_0 - f(x_0)]^2\} + \operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\} + 2\operatorname{E}\{[y_0 - f(x_0)][f(x_0) - \hat f(x_0)]\}. \] Since \(y_0 = f(x_0) + \epsilon\) with \(\operatorname{E}(\epsilon) = 0\), the first term is \[ \operatorname{E}\{[y_0 - f(x_0)]^2\} = \operatorname{E}(\epsilon^2) = \operatorname{Var}(\epsilon), \] and because \(\epsilon\) (in the test response) is independent of \(\hat f(x_0)\) (fit on the training data), the cross term vanishes: \[ 2\operatorname{E}\{\epsilon\,[f(x_0) - \hat f(x_0)]\} = 2\operatorname{E}[f(x_0) - \hat f(x_0)]\operatorname{E}(\epsilon) = 0. \] For the middle term, add and subtract \(\operatorname{E}[\hat f(x_0)]\): \[ \operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\} = \{f(x_0) - \operatorname{E}[\hat f(x_0)]\}^2 + \operatorname{E}\{[\hat f(x_0) - \operatorname{E}\hat f(x_0)]^2\} = [\operatorname{Bias}(\hat f(x_0))]^2 + \operatorname{Var}(\hat f(x_0)), \] where the cross term again vanishes because \(\operatorname{E}\{\hat f(x_0) - \operatorname{E}[\hat f(x_0)]\} = 0\). Combining the three pieces: \[ \operatorname{E}\{[y_0 - \hat f(x_0)]^2\} = \operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2 + \operatorname{Var}(\epsilon). \]
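
As a sanity check, a small Monte Carlo sketch (the true function \(f(x)=\sin x\), \(\sigma = 1\), and a least-squares-line estimator are all assumptions chosen for illustration) confirms that the two sides of the decomposition agree:

```r
# Monte Carlo check of the bias-variance decomposition at a fixed x0
set.seed(1)
x0 <- 2; sigma <- 1; f <- sin
# Repeatedly refit f-hat on fresh training sets and record f-hat(x0)
fhat_x0 <- replicate(2000, {
  x <- runif(30, 0, 3)
  y <- f(x) + rnorm(30, sd = sigma)
  predict(lm(y ~ x), newdata = data.frame(x = x0))
})
y0 <- f(x0) + rnorm(2000, sd = sigma)    # independent test responses at x0
lhs <- mean((y0 - fhat_x0)^2)            # expected test error at x0
rhs <- var(fhat_x0) + (mean(fhat_x0) - f(x0))^2 + sigma^2
c(lhs = lhs, rhs = rhs)                  # the two quantities should be close
```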

2 ISL Exercise 2.4.3 (10pts)

library(tidyverse)
flexibility <- seq(1, 10, length.out = 100)
bias_squared <- (10 - flexibility)^2 / 100
variance <- flexibility^2 / 100
training_error <- bias_squared - (flexibility / 50) + 0.2
irreducible_error <- rep(0.2, length(flexibility))
test_error <- bias_squared + variance + irreducible_error
data <- data.frame(flexibility, bias_squared, variance, 
                   training_error, test_error, irreducible_error)
library(reshape2)
data_melted <- melt(data, id.vars = 'flexibility')
ggplot(data_melted, aes(x = flexibility, y = value, color = variable)) +
  geom_line() +
  labs(x = 'Flexibility', y = 'Error', title = 'Bias-Variance Decomposition') +
  theme_minimal() +
  scale_color_discrete(name = "Curves", labels = c("Squared Bias", "Variance", "Training Error", 
                                                   "Test Error", "Bayes (Irreducible) Error"))
  1. Test error decreases initially but rises again as increasing flexibility leads to overfitting. Squared bias decreases with flexibility as the model represents the underlying problem better, while variance increases, especially at high flexibility, reducing the model's robustness. Training error declines monotonically, and the irreducible (Bayes) error stays constant as a floor for the test error. This highlights the challenge of balancing model complexity for optimal test performance.

3 ISL Exercise 2.4.4 (10pts)

  1. Medical Diagnosis: Response: Diagnosis (e.g., disease present or not). Predictors: Patient symptoms, lab test results, demographic data (age, gender), medical history. Goal: The goal is primarily prediction. The emphasis is on accurately predicting whether a patient has a specific disease or condition based on their symptoms and test results. While inference can be valuable for understanding which factors are most predictive of certain diseases, the immediate utility is in the accurate and efficient prediction of the disease for treatment decisions.
  2. Credit Scoring in Finance: Response: Creditworthiness (e.g., high or low credit risk). Predictors: Credit history, current debts, income, employment status, past loan repayment history, credit score. Goal: This application leans towards prediction. Financial institutions use these models to predict the likelihood that an individual will repay a loan. Understanding the factors that influence credit risk is important, but the primary objective is to predict an individual’s credit risk to make lending decisions.
  3. Customer Churn Prediction in Business: Response: Churn (e.g., whether a customer will stop using a company’s products/services). Predictors: Customer interaction history, purchase history, customer service records, demographic data, usage patterns. Goal: Again, the goal is prediction. Companies use these models to predict which customers are at risk of leaving so that they can take proactive measures to retain them. While inference might help understand why customers churn, the direct aim is to predict churn to implement retention strategies.
  1. Real Estate Pricing: Response: House price. Predictors: Size (square footage), location, number of bedrooms, age of the house, proximity to amenities, etc. Goal: This application is primarily for prediction. The focus is on predicting the price of a house based on various features, which is crucial for buyers, sellers, and real estate agents.
  2. Weather Forecasting: Response: Temperature. Predictors: Humidity, atmospheric pressure, wind speed, historical temperature data, time of the year, etc. Goal: The goal is prediction. Accurate temperature forecasts based on current and historical weather data are vital for a range of activities, from agriculture to daily planning.
  3. Educational Outcomes: Response: Student academic performance (e.g., grades or test scores). Predictors: Study hours, attendance, parental education level, socioeconomic status, previous academic records, etc. Goal: This can be for both inference and prediction. While predicting student performance is valuable, understanding the impact of various factors (like study hours or socioeconomic status) on academic outcomes is also crucial for educational policy and interventions.
  1. Market Segmentation: Objective: To categorize customers into different segments based on their purchasing behavior, preferences, demographic characteristics, etc. Application: Companies can use cluster analysis to identify distinct groups within their customer base. This helps in tailoring marketing strategies, developing targeted products, and improving customer service by understanding the specific needs and preferences of each segment.
  2. Genomic Data Classification in Biology: Objective: To classify genetic data for identifying patterns and similarities in DNA sequences. Application: In biological research, cluster analysis is employed to group genes with similar expression patterns, which can be indicative of shared functions or regulatory mechanisms. This is crucial in understanding genetic diseases, evolutionary biology, and the development of targeted treatments.
  3. Document Classification: Objective: To organize and categorize large sets of digital documents based on their content and thematic similarities. Application: This is particularly useful in digital libraries, online research databases, and for information retrieval systems. By clustering documents, these systems can enhance search accuracy, improve the organization of information, and enable users to discover related content more effectively.

4 ISL Exercise 2.4.10 (30pts)

You can read in the Boston data set directly from the url https://raw.githubusercontent.com/ucla-biostat-212a/2024winter/master/slides/data/Boston.csv. A documentation of the Boston data set is here.

  1. 4.0.1 R

library(tidyverse)

Boston <- read_csv("https://raw.githubusercontent.com/ucla-biostat-212a/2024winter/master/slides/data/Boston.csv", 
                   col_select = -1) %>% 
  print(width = Inf)
# A tibble: 506 × 13
      crim    zn indus  chas   nox    rm   age   dis   rad   tax ptratio lstat
     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
 1 0.00632  18    2.31     0 0.538  6.58  65.2  4.09     1   296    15.3  4.98
 2 0.0273    0    7.07     0 0.469  6.42  78.9  4.97     2   242    17.8  9.14
 3 0.0273    0    7.07     0 0.469  7.18  61.1  4.97     2   242    17.8  4.03
 4 0.0324    0    2.18     0 0.458  7.00  45.8  6.06     3   222    18.7  2.94
 5 0.0690    0    2.18     0 0.458  7.15  54.2  6.06     3   222    18.7  5.33
 6 0.0298    0    2.18     0 0.458  6.43  58.7  6.06     3   222    18.7  5.21
 7 0.0883   12.5  7.87     0 0.524  6.01  66.6  5.56     5   311    15.2 12.4 
 8 0.145    12.5  7.87     0 0.524  6.17  96.1  5.95     5   311    15.2 19.2 
 9 0.211    12.5  7.87     0 0.524  5.63 100    6.08     5   311    15.2 29.9 
10 0.170    12.5  7.87     0 0.524  6.00  85.9  6.59     5   311    15.2 17.1 
    medv
   <dbl>
 1  24  
 2  21.6
 3  34.7
 4  33.4
 5  36.2
 6  28.7
 7  22.9
 8  27.1
 9  16.5
10  18.9
# ℹ 496 more rows

answer: There are 506 rows and 13 columns in the data set. Each row represents the set of observations for a given neighborhood in Boston, and each column represents one variable measured on the 506 neighborhoods.

str(Boston)
tibble [506 × 13] (S3: tbl_df/tbl/data.frame)
 $ crim   : num [1:506] 0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num [1:506] 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num [1:506] 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : num [1:506] 0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num [1:506] 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num [1:506] 6.58 6.42 7.18 7 7.15 ...
 $ age    : num [1:506] 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num [1:506] 4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : num [1:506] 1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num [1:506] 296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num [1:506] 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ lstat  : num [1:506] 4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num [1:506] 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
 - attr(*, "spec")=
  .. cols(
  ..   ...1 = col_skip(),
  ..   crim = col_double(),
  ..   zn = col_double(),
  ..   indus = col_double(),
  ..   chas = col_double(),
  ..   nox = col_double(),
  ..   rm = col_double(),
  ..   age = col_double(),
  ..   dis = col_double(),
  ..   rad = col_double(),
  ..   tax = col_double(),
  ..   ptratio = col_double(),
  ..   lstat = col_double(),
  ..   medv = col_double()
  .. )
Boston$chas <- as.numeric(Boston$chas)
Boston$rad <- as.numeric(Boston$rad)
pairs(Boston)

answer: Not much can be discerned other than the fact that some variables appear to be correlated.

cor(Boston)
               crim          zn       indus         chas         nox
crim     1.00000000 -0.20046922  0.40658341 -0.055891582  0.42097171
zn      -0.20046922  1.00000000 -0.53382819 -0.042696719 -0.51660371
indus    0.40658341 -0.53382819  1.00000000  0.062938027  0.76365145
chas    -0.05589158 -0.04269672  0.06293803  1.000000000  0.09120281
nox      0.42097171 -0.51660371  0.76365145  0.091202807  1.00000000
rm      -0.21924670  0.31199059 -0.39167585  0.091251225 -0.30218819
age      0.35273425 -0.56953734  0.64477851  0.086517774  0.73147010
dis     -0.37967009  0.66440822 -0.70802699 -0.099175780 -0.76923011
rad      0.62550515 -0.31194783  0.59512927 -0.007368241  0.61144056
tax      0.58276431 -0.31456332  0.72076018 -0.035586518  0.66802320
ptratio  0.28994558 -0.39167855  0.38324756 -0.121515174  0.18893268
lstat    0.45562148 -0.41299457  0.60379972 -0.053929298  0.59087892
medv    -0.38830461  0.36044534 -0.48372516  0.175260177 -0.42732077
                 rm         age         dis          rad         tax    ptratio
crim    -0.21924670  0.35273425 -0.37967009  0.625505145  0.58276431  0.2899456
zn       0.31199059 -0.56953734  0.66440822 -0.311947826 -0.31456332 -0.3916785
indus   -0.39167585  0.64477851 -0.70802699  0.595129275  0.72076018  0.3832476
chas     0.09125123  0.08651777 -0.09917578 -0.007368241 -0.03558652 -0.1215152
nox     -0.30218819  0.73147010 -0.76923011  0.611440563  0.66802320  0.1889327
rm       1.00000000 -0.24026493  0.20524621 -0.209846668 -0.29204783 -0.3555015
age     -0.24026493  1.00000000 -0.74788054  0.456022452  0.50645559  0.2615150
dis      0.20524621 -0.74788054  1.00000000 -0.494587930 -0.53443158 -0.2324705
rad     -0.20984667  0.45602245 -0.49458793  1.000000000  0.91022819  0.4647412
tax     -0.29204783  0.50645559 -0.53443158  0.910228189  1.00000000  0.4608530
ptratio -0.35550149  0.26151501 -0.23247054  0.464741179  0.46085304  1.0000000
lstat   -0.61380827  0.60233853 -0.49699583  0.488676335  0.54399341  0.3740443
medv     0.69535995 -0.37695457  0.24992873 -0.381626231 -0.46853593 -0.5077867
             lstat       medv
crim     0.4556215 -0.3883046
zn      -0.4129946  0.3604453
indus    0.6037997 -0.4837252
chas    -0.0539293  0.1752602
nox      0.5908789 -0.4273208
rm      -0.6138083  0.6953599
age      0.6023385 -0.3769546
dis     -0.4969958  0.2499287
rad      0.4886763 -0.3816262
tax      0.5439934 -0.4685359
ptratio  0.3740443 -0.5077867
lstat    1.0000000 -0.7376627
medv    -0.7376627  1.0000000

answer: The variables that are most correlated with medv are lstat and rm. The variables that are most correlated with lstat are rm and ptratio. The variables that are most correlated with rm are lstat and ptratio.

summary(Boston$crim)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.00632  0.08204  0.25651  3.61352  3.67708 88.97620 
summary(Boston$tax)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  187.0   279.0   330.0   408.2   666.0   711.0 
summary(Boston$ptratio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.60   17.40   19.05   18.46   20.20   22.00 
ggplot(Boston, aes(crim)) +
  geom_histogram(binwidth = 5) +
  labs(x = "Crime rate", y = "Number of Suburbs")

ggplot(Boston, aes(tax)) +
  geom_histogram(binwidth = 50) +
  labs(x = "Full-value property-tax rate per $10,000", y = "Number of Suburbs")

ggplot(Boston, aes(ptratio)) +
  geom_histogram(binwidth = 5) +
  labs(x = "Pupil-teacher ratio by town", y = "Number of Suburbs")

selection <- subset( Boston, crim > 10)
nrow(selection)/ nrow(Boston)
[1] 0.1067194
selection <- subset( Boston, crim > 60)
nrow(selection)/ nrow(Boston)
[1] 0.005928854
selection <- subset( Boston, tax > 600)
nrow(selection)/ nrow(Boston)
[1] 0.270751
selection <- subset( Boston, tax < 600)
nrow(selection)/ nrow(Boston)
[1] 0.729249
selection <- subset( Boston, ptratio > 17.5)
nrow(selection)/ nrow(Boston)
[1] 0.715415
selection <- subset( Boston, ptratio < 17.5)
nrow(selection)/ nrow(Boston)
[1] 0.284585

answer: For the crime rate, the median is 0.26 and the maximum is 88.98, so some neighborhoods have very high crime rates: about 11% of neighborhoods have crime rates above 10, and 0.6% have crime rates above 60. Based on the histogram, a cluster of neighborhoods has very high tax rates; the median is 330, the mean 408.2, and the maximum 711, with 27% of neighborhoods above 600 and 73% below. For the pupil-teacher ratio, the median is 19.05, the mean 18.46, and the maximum 22; about 72% of neighborhoods have a ratio above 17.5 and 28% below.

nrow(subset(Boston, chas ==1)) 
[1] 35

answer: There are 35 census tracts that bound the Charles River.

summary(Boston$ptratio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.60   17.40   19.05   18.46   20.20   22.00 

answer: The median pupil-teacher ratio among the towns in this data set is 19.05.

selection <- Boston[order(Boston$medv),]
selection[1,]
# A tibble: 1 × 13
   crim    zn indus  chas   nox    rm   age   dis   rad   tax ptratio lstat
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
1  38.4     0  18.1     0 0.693  5.45   100  1.49    24   666    20.2  30.6
# ℹ 1 more variable: medv <dbl>
summary(selection)
      crim                zn             indus            chas        
 Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
 1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
 Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
 Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
 3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
 Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
      nox               rm             age              dis        
 Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
 1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
 Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
 Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
 3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
 Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
      rad              tax           ptratio          lstat      
 Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   : 1.73  
 1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.: 6.95  
 Median : 5.000   Median :330.0   Median :19.05   Median :11.36  
 Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :12.65  
 3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:16.95  
 Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :37.97  
      medv      
 Min.   : 5.00  
 1st Qu.:17.02  
 Median :21.20  
 Mean   :22.53  
 3rd Qu.:25.00  
 Max.   :50.00  

answer: The lowest median value of owner-occupied homes is 5 (i.e., $5,000). This census tract has several distinctive characteristics compared with the overall ranges. The crime rate is very high at 38.4, far above the median, so this area has an unusually high crime rate. The tract has no residential land zoned for large lots (zn = 0), a high industrial proportion (indus = 18.1), and the oldest possible housing stock (age = 100), indicating a fully developed area. The nox level (0.693) is high, consistent with industrial pollution. The average number of rooms is low at 5.45, and the distance to employment centers is quite short at 1.49. Accessibility to radial highways is the maximum (rad = 24), the property-tax rate is very high at 666, and the pupil-teacher ratio is on the higher end at 20.2. Lastly, the proportion of lower-status population is very high (lstat = 30.6). These factors collectively suggest an urban, high-crime, industrial area with economic challenges.

rm_over_7 <- subset(Boston, rm>7)
nrow(rm_over_7) 
[1] 64
rm_over_8 <- subset(Boston, rm>8)
nrow(rm_over_8) 
[1] 13
summary(rm_over_8)
      crim               zn            indus             chas       
 Min.   :0.02009   Min.   : 0.00   Min.   : 2.680   Min.   :0.0000  
 1st Qu.:0.33147   1st Qu.: 0.00   1st Qu.: 3.970   1st Qu.:0.0000  
 Median :0.52014   Median : 0.00   Median : 6.200   Median :0.0000  
 Mean   :0.71879   Mean   :13.62   Mean   : 7.078   Mean   :0.1538  
 3rd Qu.:0.57834   3rd Qu.:20.00   3rd Qu.: 6.200   3rd Qu.:0.0000  
 Max.   :3.47428   Max.   :95.00   Max.   :19.580   Max.   :1.0000  
      nox               rm             age             dis       
 Min.   :0.4161   Min.   :8.034   Min.   : 8.40   Min.   :1.801  
 1st Qu.:0.5040   1st Qu.:8.247   1st Qu.:70.40   1st Qu.:2.288  
 Median :0.5070   Median :8.297   Median :78.30   Median :2.894  
 Mean   :0.5392   Mean   :8.349   Mean   :71.54   Mean   :3.430  
 3rd Qu.:0.6050   3rd Qu.:8.398   3rd Qu.:86.50   3rd Qu.:3.652  
 Max.   :0.7180   Max.   :8.780   Max.   :93.90   Max.   :8.907  
      rad              tax           ptratio          lstat           medv     
 Min.   : 2.000   Min.   :224.0   Min.   :13.00   Min.   :2.47   Min.   :21.9  
 1st Qu.: 5.000   1st Qu.:264.0   1st Qu.:14.70   1st Qu.:3.32   1st Qu.:41.7  
 Median : 7.000   Median :307.0   Median :17.40   Median :4.14   Median :48.3  
 Mean   : 7.462   Mean   :325.1   Mean   :16.36   Mean   :4.31   Mean   :44.2  
 3rd Qu.: 8.000   3rd Qu.:307.0   3rd Qu.:17.40   3rd Qu.:5.12   3rd Qu.:50.0  
 Max.   :24.000   Max.   :666.0   Max.   :20.20   Max.   :7.44   Max.   :50.0  

answer: There are 64 neighborhoods averaging more than 7 rooms per dwelling and 13 averaging more than 8. Compared with the full data set (median medv 21.2, mean 22.53, in thousands of dollars), the neighborhoods with more than 8 rooms have much higher home values (median medv 48.3, mean 44.2), along with low crime (median crim 0.52) and a low proportion of lower-status residents (median lstat 4.14).

5 ISL Exercise 3.7.3 (12pts)

  1. Salary = 50 + 20⋅(GPA) + 0.07⋅(IQ) + 35⋅(Level) + 0.01⋅(GPA×IQ) − 10⋅(GPA×Level)

i. For a fixed IQ and GPA, high school graduates (Level = 0) earn an average of 50 + 20⋅GPA + 0.07⋅IQ + 0.01⋅GPA⋅IQ, while college graduates (Level = 1) earn an average of 50 + 20⋅GPA + 0.07⋅IQ + 35 + 0.01⋅GPA⋅IQ − 10⋅GPA. Subtracting the common terms, college graduates earn 35 − 10⋅GPA more on average. Since the sign of this difference depends on GPA, it is not true that high school graduates always earn more, so statement i does not hold in general.

ii. By the same reasoning, it is not true that college graduates always earn more, so statement ii does not hold either.

iii. When GPA > 3.5, the difference 35 − 10⋅GPA is negative, so for a high enough GPA, high school graduates earn more on average than college graduates. This statement is true.

iv. This contradicts iii: a higher GPA makes the college advantage smaller, not larger, so this statement is false.

answer: The answer is iii.

  1. For a college graduate (Level = 1), the model gives Salary = 50 + 20⋅(GPA) + 0.07⋅(IQ) + 35⋅(1) + 0.01⋅(GPA×IQ) − 10⋅(GPA×1). Plugging in IQ = 110 and GPA = 4.0: 50 + 20⋅4.0 + 0.07⋅110 + 35 + 0.01⋅(4.0×110) − 10⋅4.0 = 137.1. answer: The estimated salary of a college graduate with an IQ of 110 and a GPA of 4.0 is $137,100.
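
The arithmetic can be checked directly in R:

```r
# Predicted salary (in thousands) from the fitted model for a college graduate
gpa <- 4.0; iq <- 110; level <- 1   # level = 1 codes college, 0 codes high school
salary <- 50 + 20*gpa + 0.07*iq + 35*level + 0.01*gpa*iq - 10*gpa*level
salary  # 137.1, i.e. $137,100
```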

  2. answer: False. The magnitude of a coefficient by itself says nothing about statistical significance; we would need its standard error (and the resulting t-statistic and p-value) to judge whether there is evidence of an interaction effect.

6 ISL Exercise 3.7.15 (20pts)

  1. Model: crim = β0 + β1⋅zn + ε

boston.zn <- lm(crim ~ zn, data = Boston)
summary(boston.zn)

Call:
lm(formula = crim ~ zn, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-4.429 -4.222 -2.620  1.250 84.523 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.45369    0.41722  10.675  < 2e-16 ***
zn          -0.07393    0.01609  -4.594 5.51e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.435 on 504 degrees of freedom
Multiple R-squared:  0.04019,   Adjusted R-squared:  0.03828 
F-statistic:  21.1 on 1 and 504 DF,  p-value: 5.506e-06
par(mfrow = c(2, 2))
plot(boston.zn)

The F-statistic is 21.1 and the p-value is 5.506e-06, far below 0.05, so we reject the null hypothesis that β1 = 0: there is a statistically significant association between crim and zn.

Model: crim = β0 + β1⋅indus + ε

boston.indus <- lm(crim ~ indus, data = Boston)
summary(boston.indus)

Call:
lm(formula = crim ~ indus, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.972  -2.698  -0.736   0.712  81.813 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.06374    0.66723  -3.093  0.00209 ** 
indus        0.50978    0.05102   9.991  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.866 on 504 degrees of freedom
Multiple R-squared:  0.1653,    Adjusted R-squared:  0.1637 
F-statistic: 99.82 on 1 and 504 DF,  p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.indus)

The F-statistic is 99.82 and the p-value is < 2.2e-16, far below 0.05, so we reject the null hypothesis that β1 = 0: there is a statistically significant association between crim and indus.

Model: crim = β0 + β1⋅chas + ε

boston.chas <- lm(crim ~ chas, data = Boston)
summary(boston.chas)

Call:
lm(formula = crim ~ chas, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-3.738 -3.661 -3.435  0.018 85.232 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.7444     0.3961   9.453   <2e-16 ***
chas         -1.8928     1.5061  -1.257    0.209    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.597 on 504 degrees of freedom
Multiple R-squared:  0.003124,  Adjusted R-squared:  0.001146 
F-statistic: 1.579 on 1 and 504 DF,  p-value: 0.2094
par(mfrow = c(2, 2))
plot(boston.chas)

The F-statistic is 1.579 and the p-value is 0.2094, well above 0.05, so we fail to reject the null hypothesis that β1 = 0: there is no statistically significant association between crim and chas.

Model: crim = β0 + β1⋅nox + ε

boston.nox <- lm(crim ~ nox, data = Boston)
summary(boston.nox)

Call:
lm(formula = crim ~ nox, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-12.371  -2.738  -0.974   0.559  81.728 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -13.720      1.699  -8.073 5.08e-15 ***
nox           31.249      2.999  10.419  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.81 on 504 degrees of freedom
Multiple R-squared:  0.1772,    Adjusted R-squared:  0.1756 
F-statistic: 108.6 on 1 and 504 DF,  p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.nox)

The F-statistic is 108.6 and the p-value is < 2.2e-16, far below 0.05, so we reject the null hypothesis that β1 = 0: there is a statistically significant association between crim and nox.

Model: crim = β0 + β1⋅rm + ε

boston.rm <- lm(crim ~ rm, data = Boston)
summary(boston.rm)

Call:
lm(formula = crim ~ rm, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-6.604 -3.952 -2.654  0.989 87.197 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   20.482      3.365   6.088 2.27e-09 ***
rm            -2.684      0.532  -5.045 6.35e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.401 on 504 degrees of freedom
Multiple R-squared:  0.04807,   Adjusted R-squared:  0.04618 
F-statistic: 25.45 on 1 and 504 DF,  p-value: 6.347e-07
par(mfrow = c(2, 2))
plot(boston.rm)

The F-statistic is 25.45 and the p-value is 6.347e-07, far below 0.05, so we reject the null hypothesis that β1 = 0: there is a statistically significant association between crim and rm.

Model: crim = β0 + β1⋅age + ε

boston.age <- lm(crim ~ age, data = Boston)
summary(boston.age)

Call:
lm(formula = crim ~ age, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-6.789 -4.257 -1.230  1.527 82.849 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.77791    0.94398  -4.002 7.22e-05 ***
age          0.10779    0.01274   8.463 2.85e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.057 on 504 degrees of freedom
Multiple R-squared:  0.1244,    Adjusted R-squared:  0.1227 
F-statistic: 71.62 on 1 and 504 DF,  p-value: 2.855e-16
par(mfrow = c(2, 2))
plot(boston.age)

The F-statistic is 71.62 and the p-value is 2.855e-16, far below 0.05, so we reject the null hypothesis that β1 = 0: there is a statistically significant association between crim and age.

Model: crim = β0 + β1⋅dis + ε

boston.dis <- lm(crim ~ dis, data = Boston)
summary(boston.dis)

Call:
lm(formula = crim ~ dis, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-6.708 -4.134 -1.527  1.516 81.674 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.4993     0.7304  13.006   <2e-16 ***
dis          -1.5509     0.1683  -9.213   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.965 on 504 degrees of freedom
Multiple R-squared:  0.1441,    Adjusted R-squared:  0.1425 
F-statistic: 84.89 on 1 and 504 DF,  p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.dis)

The F-statistic is 84.89 and the p-value is < 2.2e-16, far below 0.05, so we reject the null hypothesis that β1 = 0: there is a statistically significant association between crim and dis.

Model: crim = β0 + β1⋅rad + ε

boston.rad <- lm(crim ~ rad, data = Boston)
summary(boston.rad)

Call:
lm(formula = crim ~ rad, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.164  -1.381  -0.141   0.660  76.433 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.28716    0.44348  -5.157 3.61e-07 ***
rad          0.61791    0.03433  17.998  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.718 on 504 degrees of freedom
Multiple R-squared:  0.3913,    Adjusted R-squared:   0.39 
F-statistic: 323.9 on 1 and 504 DF,  p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.rad)

The F-statistic is 323.9 and the p-value is < 2.2e-16, far below 0.05, so we reject the null hypothesis that β1 = 0: there is a statistically significant association between crim and rad.

Model: crim = β0 + β1⋅tax + ε

boston.tax <- lm(crim ~ tax, data = Boston)
summary(boston.tax)

Call:
lm(formula = crim ~ tax, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-12.513  -2.738  -0.194   1.065  77.696 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -8.528369   0.815809  -10.45   <2e-16 ***
tax          0.029742   0.001847   16.10   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.997 on 504 degrees of freedom
Multiple R-squared:  0.3396,    Adjusted R-squared:  0.3383 
F-statistic: 259.2 on 1 and 504 DF,  p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.tax)

The F-statistic is 259.2 and the p-value is < 2.2e-16, far below 0.05, so we reject the null hypothesis that β1 = 0: there is a statistically significant association between crim and tax.

Model: crim = β0 + β1⋅ptratio + ε

boston.ptratio <- lm(crim ~ ptratio, data = Boston)
summary(boston.ptratio)

Call:
lm(formula = crim ~ ptratio, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-7.654 -3.985 -1.912  1.825 83.353 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -17.6469     3.1473  -5.607 3.40e-08 ***
ptratio       1.1520     0.1694   6.801 2.94e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.24 on 504 degrees of freedom
Multiple R-squared:  0.08407,   Adjusted R-squared:  0.08225 
F-statistic: 46.26 on 1 and 504 DF,  p-value: 2.943e-11
par(mfrow = c(2, 2))
plot(boston.ptratio)

The F-statistic is 46.26 and the p-value is 2.943e-11, far below 0.05, so we reject the null hypothesis that β1 = 0: there is a statistically significant association between crim and ptratio.

Model: crim = β0 + β1⋅lstat + ε

boston.lstat <- lm(crim ~ lstat, data = Boston)
summary(boston.lstat)

Call:
lm(formula = crim ~ lstat, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-13.925  -2.822  -0.664   1.079  82.862 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.33054    0.69376  -4.801 2.09e-06 ***
lstat        0.54880    0.04776  11.491  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.664 on 504 degrees of freedom
Multiple R-squared:  0.2076,    Adjusted R-squared:  0.206 
F-statistic:   132 on 1 and 504 DF,  p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.lstat)

We can see that the F-statistic is 132 and the p-value is < 2.2e-16, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and lstat.

Model: crim = β0 + β1·medv + ε

boston.medv <- lm(crim ~ medv, data = Boston)
summary(boston.medv)

Call:
lm(formula = crim ~ medv, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-9.071 -4.022 -2.343  1.298 80.957 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 11.79654    0.93419   12.63   <2e-16 ***
medv        -0.36316    0.03839   -9.46   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.934 on 504 degrees of freedom
Multiple R-squared:  0.1508,    Adjusted R-squared:  0.1491 
F-statistic: 89.49 on 1 and 504 DF,  p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.medv)

We can see that the F-statistic is 89.49 and the p-value is < 2.2e-16, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and medv.
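Rather than fitting each simple regression by hand, the same slope estimates and p-values can be collected in one pass. A minimal sketch, assuming the copy of the `Boston` data shipped with the MASS package (the homework uses the ISLR2 version, which drops one column; the approach is identical):

```r
library(MASS)  # assumption: MASS's copy of the Boston data set

# Fit crim ~ x for every other column and tabulate the slope's estimate,
# standard error, t value, and p-value
predictors <- setdiff(names(Boston), "crim")
uni.results <- t(sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "crim"), data = Boston)
  summary(fit)$coefficients[2, ]  # second row: the slope term
}))
colnames(uni.results) <- c("Estimate", "Std.Error", "t.value", "p.value")
round(uni.results, 4)
```

Scanning the `p.value` column of the resulting table reproduces the per-predictor conclusions above in a single view.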

lm.all <- lm(crim ~.,data = Boston)
summary(lm.all)

Call:
lm(formula = crim ~ ., data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-8.534 -2.248 -0.348  1.087 73.923 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.7783938  7.0818258   1.946 0.052271 .  
zn           0.0457100  0.0187903   2.433 0.015344 *  
indus       -0.0583501  0.0836351  -0.698 0.485709    
chas        -0.8253776  1.1833963  -0.697 0.485841    
nox         -9.9575865  5.2898242  -1.882 0.060370 .  
rm           0.6289107  0.6070924   1.036 0.300738    
age         -0.0008483  0.0179482  -0.047 0.962323    
dis         -1.0122467  0.2824676  -3.584 0.000373 ***
rad          0.6124653  0.0875358   6.997 8.59e-12 ***
tax         -0.0037756  0.0051723  -0.730 0.465757    
ptratio     -0.3040728  0.1863598  -1.632 0.103393    
lstat        0.1388006  0.0757213   1.833 0.067398 .  
medv        -0.2200564  0.0598240  -3.678 0.000261 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.46 on 493 degrees of freedom
Multiple R-squared:  0.4493,    Adjusted R-squared:  0.4359 
F-statistic: 33.52 on 12 and 493 DF,  p-value: < 2.2e-16

answer: In the multiple regression, zn, dis, rad, and medv have statistically significant coefficients (p-value below 0.05), so for these predictors we can reject the null hypothesis H0: βj = 0; they show a significant association with crim after adjusting for the other variables.
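The same list can be pulled out programmatically by filtering the coefficient table's p-values. A sketch, again assuming MASS's `Boston` (which carries one extra column relative to the ISLR2 data used above, so the returned list may differ by one name):

```r
library(MASS)  # assumption: MASS's copy of the Boston data set

lm.all <- lm(crim ~ ., data = Boston)
pvals <- summary(lm.all)$coefficients[, "Pr(>|t|)"]
# Names of terms significant at the 5% level, excluding the intercept
sig <- setdiff(names(pvals)[pvals < 0.05], "(Intercept)")
sig
```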

x = c(coefficients(boston.zn)[2],
      coefficients(boston.indus)[2],
      coefficients(boston.chas)[2],
      coefficients(boston.nox)[2],
      coefficients(boston.rm)[2],
      coefficients(boston.age)[2],
      coefficients(boston.dis)[2],
      coefficients(boston.rad)[2],
      coefficients(boston.tax)[2],
      coefficients(boston.ptratio)[2],
      coefficients(boston.lstat)[2],
      coefficients(boston.medv)[2])
y = coefficients(lm.all)[2:13]
plot(x, y, col = "blue",pch =19, ylab = "multiple regression coefficients",
     xlab = "Univariate Regression coefficients",
     main = "Relationship between Multiple regression \n and univariate regression coefficients")

Model: crim = β0 + β1·zn + β2·zn² + β3·zn³ + ε
boston.poly.zn = lm(crim ~ zn + I(zn^2) + I(zn^3), data = Boston)
summary(boston.poly.zn)

Call:
lm(formula = crim ~ zn + I(zn^2) + I(zn^3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-4.821 -4.614 -1.294  0.473 84.130 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.846e+00  4.330e-01  11.192  < 2e-16 ***
zn          -3.322e-01  1.098e-01  -3.025  0.00261 ** 
I(zn^2)      6.483e-03  3.861e-03   1.679  0.09375 .  
I(zn^3)     -3.776e-05  3.139e-05  -1.203  0.22954    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.372 on 502 degrees of freedom
Multiple R-squared:  0.05824,   Adjusted R-squared:  0.05261 
F-statistic: 10.35 on 3 and 502 DF,  p-value: 1.281e-06

Model: crim = β0 + β1·indus + β2·indus² + β3·indus³ + ε

boston.poly.indus = lm(crim ~ indus + I(indus^2) + I(indus^3), data = Boston)
summary(boston.poly.indus)

Call:
lm(formula = crim ~ indus + I(indus^2) + I(indus^3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-8.278 -2.514  0.054  0.764 79.713 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.6625683  1.5739833   2.327   0.0204 *  
indus       -1.9652129  0.4819901  -4.077 5.30e-05 ***
I(indus^2)   0.2519373  0.0393221   6.407 3.42e-10 ***
I(indus^3)  -0.0069760  0.0009567  -7.292 1.20e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.423 on 502 degrees of freedom
Multiple R-squared:  0.2597,    Adjusted R-squared:  0.2552 
F-statistic: 58.69 on 3 and 502 DF,  p-value: < 2.2e-16

Model: crim = β0 + β1·chas + β2·chas² + β3·chas³ + ε

boston.poly.chas = lm(crim ~ I(chas^2) + I(chas^3), data = Boston)
summary(boston.poly.chas)

Call:
lm(formula = crim ~ I(chas^2) + I(chas^3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-3.738 -3.661 -3.435  0.018 85.232 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.7444     0.3961   9.453   <2e-16 ***
I(chas^2)    -1.8928     1.5061  -1.257    0.209    
I(chas^3)         NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.597 on 504 degrees of freedom
Multiple R-squared:  0.003124,  Adjusted R-squared:  0.001146 
F-statistic: 1.579 on 1 and 504 DF,  p-value: 0.2094
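The `NA` row above is expected: `chas` is a 0/1 indicator, so every positive integer power reproduces it exactly, the polynomial terms are perfectly collinear, and `lm()` drops the redundant columns ("not defined because of singularities"). A small self-contained check:

```r
# For a 0/1 dummy variable, all positive integer powers coincide elementwise,
# so cubic terms in chas add no new information to the design matrix
chas <- c(0, 1, 1, 0, 1)
identical(chas, chas^2)  # TRUE
identical(chas, chas^3)  # TRUE
```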

Model: crim = β0 + β1·nox + β2·nox² + β3·nox³ + ε

boston.poly.nox = lm(crim ~ nox + I(nox^2) + I(nox^3), data = Boston)
summary(boston.poly.nox)

Call:
lm(formula = crim ~ nox + I(nox^2) + I(nox^3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-9.110 -2.068 -0.255  0.739 78.302 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   233.09      33.64   6.928 1.31e-11 ***
nox         -1279.37     170.40  -7.508 2.76e-13 ***
I(nox^2)     2248.54     279.90   8.033 6.81e-15 ***
I(nox^3)    -1245.70     149.28  -8.345 6.96e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.234 on 502 degrees of freedom
Multiple R-squared:  0.297, Adjusted R-squared:  0.2928 
F-statistic: 70.69 on 3 and 502 DF,  p-value: < 2.2e-16

Model: crim = β0 + β1·rm + β2·rm² + β3·rm³ + ε

boston.poly.rm = lm(crim ~ rm + I(rm^2) + I(rm^3), data = Boston)
summary(boston.poly.rm)

Call:
lm(formula = crim ~ rm + I(rm^2) + I(rm^3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.485  -3.468  -2.221  -0.015  87.219 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 112.6246    64.5172   1.746   0.0815 .
rm          -39.1501    31.3115  -1.250   0.2118  
I(rm^2)       4.5509     5.0099   0.908   0.3641  
I(rm^3)      -0.1745     0.2637  -0.662   0.5086  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.33 on 502 degrees of freedom
Multiple R-squared:  0.06779,   Adjusted R-squared:  0.06222 
F-statistic: 12.17 on 3 and 502 DF,  p-value: 1.067e-07

Model: crim = β0 + β1·age + β2·age² + β3·age³ + ε

boston.poly.age = lm(crim ~ age + I(age^2) + I(age^3), data = Boston)
summary(boston.poly.age)

Call:
lm(formula = crim ~ age + I(age^2) + I(age^3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-9.762 -2.673 -0.516  0.019 82.842 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept) -2.549e+00  2.769e+00  -0.920  0.35780   
age          2.737e-01  1.864e-01   1.468  0.14266   
I(age^2)    -7.230e-03  3.637e-03  -1.988  0.04738 * 
I(age^3)     5.745e-05  2.109e-05   2.724  0.00668 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.84 on 502 degrees of freedom
Multiple R-squared:  0.1742,    Adjusted R-squared:  0.1693 
F-statistic: 35.31 on 3 and 502 DF,  p-value: < 2.2e-16

Model: crim = β0 + β1·dis + β2·dis² + β3·dis³ + ε

boston.poly.dis = lm(crim ~ dis + I(dis^2) + I(dis^3), data = Boston)
summary(boston.poly.dis)

Call:
lm(formula = crim ~ dis + I(dis^2) + I(dis^3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.757  -2.588   0.031   1.267  76.378 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  30.0476     2.4459  12.285  < 2e-16 ***
dis         -15.5543     1.7360  -8.960  < 2e-16 ***
I(dis^2)      2.4521     0.3464   7.078 4.94e-12 ***
I(dis^3)     -0.1186     0.0204  -5.814 1.09e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.331 on 502 degrees of freedom
Multiple R-squared:  0.2778,    Adjusted R-squared:  0.2735 
F-statistic: 64.37 on 3 and 502 DF,  p-value: < 2.2e-16

Model: crim = β0 + β1·rad + β2·rad² + β3·rad³ + ε

boston.poly.rad = lm(crim ~ rad + I(rad^2) + I(rad^3), data = Boston)
summary(boston.poly.rad)

Call:
lm(formula = crim ~ rad + I(rad^2) + I(rad^3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.381  -0.412  -0.269   0.179  76.217 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.605545   2.050108  -0.295    0.768
rad          0.512736   1.043597   0.491    0.623
I(rad^2)    -0.075177   0.148543  -0.506    0.613
I(rad^3)     0.003209   0.004564   0.703    0.482

Residual standard error: 6.682 on 502 degrees of freedom
Multiple R-squared:    0.4, Adjusted R-squared:  0.3965 
F-statistic: 111.6 on 3 and 502 DF,  p-value: < 2.2e-16

Model: crim = β0 + β1·tax + β2·tax² + β3·tax³ + ε

boston.poly.tax = lm(crim ~ tax + I(tax^2) + I(tax^3), data = Boston)
summary(boston.poly.tax)

Call:
lm(formula = crim ~ tax + I(tax^2) + I(tax^3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-13.273  -1.389   0.046   0.536  76.950 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.918e+01  1.180e+01   1.626    0.105
tax         -1.533e-01  9.568e-02  -1.602    0.110
I(tax^2)     3.608e-04  2.425e-04   1.488    0.137
I(tax^3)    -2.204e-07  1.889e-07  -1.167    0.244

Residual standard error: 6.854 on 502 degrees of freedom
Multiple R-squared:  0.3689,    Adjusted R-squared:  0.3651 
F-statistic:  97.8 on 3 and 502 DF,  p-value: < 2.2e-16

Model: crim = β0 + β1·ptratio + β2·ptratio² + β3·ptratio³ + ε

boston.poly.ptratio = lm(crim ~ ptratio + I(ptratio^2) + I(ptratio^3), data = Boston)
summary(boston.poly.ptratio)

Call:
lm(formula = crim ~ ptratio + I(ptratio^2) + I(ptratio^3), data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-6.833 -4.146 -1.655  1.408 82.697 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  477.18405  156.79498   3.043  0.00246 **
ptratio      -82.36054   27.64394  -2.979  0.00303 **
I(ptratio^2)   4.63535    1.60832   2.882  0.00412 **
I(ptratio^3)  -0.08476    0.03090  -2.743  0.00630 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.122 on 502 degrees of freedom
Multiple R-squared:  0.1138,    Adjusted R-squared:  0.1085 
F-statistic: 21.48 on 3 and 502 DF,  p-value: 4.171e-13

Model: crim = β0 + β1·lstat + β2·lstat² + β3·lstat³ + ε

boston.poly.lstat = lm(crim ~ lstat + I(lstat^2) + I(lstat^3), data = Boston)
summary(boston.poly.lstat)

Call:
lm(formula = crim ~ lstat + I(lstat^2) + I(lstat^3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.234  -2.151  -0.486   0.066  83.353 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  1.2009656  2.0286452   0.592   0.5541  
lstat       -0.4490656  0.4648911  -0.966   0.3345  
I(lstat^2)   0.0557794  0.0301156   1.852   0.0646 .
I(lstat^3)  -0.0008574  0.0005652  -1.517   0.1299  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.629 on 502 degrees of freedom
Multiple R-squared:  0.2179,    Adjusted R-squared:  0.2133 
F-statistic: 46.63 on 3 and 502 DF,  p-value: < 2.2e-16

Model: crim = β0 + β1·medv + β2·medv² + β3·medv³ + ε

boston.poly.medv = lm(crim ~ medv + I(medv^2) + I(medv^3), data = Boston)
summary(boston.poly.medv)

Call:
lm(formula = crim ~ medv + I(medv^2) + I(medv^3), data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-24.427  -1.976  -0.437   0.439  73.655 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 53.1655381  3.3563105  15.840  < 2e-16 ***
medv        -5.0948305  0.4338321 -11.744  < 2e-16 ***
I(medv^2)    0.1554965  0.0171904   9.046  < 2e-16 ***
I(medv^3)   -0.0014901  0.0002038  -7.312 1.05e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.569 on 502 degrees of freedom
Multiple R-squared:  0.4202,    Adjusted R-squared:  0.4167 
F-statistic: 121.3 on 3 and 502 DF,  p-value: < 2.2e-16

answer: For indus, nox, age, dis, ptratio, and medv, the squared and cubed terms are statistically significant (p-value below 0.05), so we can reject the null hypothesis of a purely linear relationship; there is evidence of a non-linear association between crim and each of these predictors. For zn only the linear term is significant, and for the remaining predictors (chas, rm, rad, tax, lstat) the higher-order terms are not significant, so there is no evidence of a non-linear relationship with crim.
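The cubic fits above can be refit in one loop and the p-value of the highest-order term collected for each predictor. A sketch, again assuming MASS's `Boston`; `chas` is skipped because, as a 0/1 dummy, its powers are collinear:

```r
library(MASS)  # assumption: MASS's copy of the Boston data set

# Refit each cubic model and keep the p-value of the cubic term;
# raw = TRUE reproduces the I(x^2), I(x^3) parameterization used above
predictors <- setdiff(names(Boston), c("crim", "chas"))
cubic.p <- sapply(predictors, function(v) {
  fit <- lm(crim ~ poly(Boston[[v]], 3, raw = TRUE), data = Boston)
  summary(fit)$coefficients[4, "Pr(>|t|)"]  # row 4: the cubic term
})
sort(round(cubic.p, 5))  # small values suggest a non-linear relationship
```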

7 Bonus question (20pts)

For multiple linear regression, show that \(R^2\) is equal to the correlation between the response vector \(\mathbf{y} = (y_1, \ldots, y_n)^T\) and the fitted values \(\hat{\mathbf{y}} = (\hat y_1, \ldots, \hat y_n)^T\). That is \[ R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2. \]

answer: Recall that \[ \operatorname{Cor}(x,y) = \frac{\sum_{i}(x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_{i}(x_i - \overline{x})^2 \sum_{i}(y_i - \overline{y})^2}} = \frac{(x - \frac{1}{n}Jx)^T(y - \frac{1}{n}Jy)}{\sqrt{(x - \frac{1}{n}Jx)^T(x - \frac{1}{n}Jx)\,(y - \frac{1}{n}Jy)^T(y - \frac{1}{n}Jy)}}, \] where \[ J = \mathbf{1}\mathbf{1}^T = \begin{bmatrix} 1 & 1 & \dots & 1 \\ 1 & 1 & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1 \end{bmatrix}_{n \times n}. \] For \(R^2\), \[ R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2} = 1 - \frac{(Y - HY)^T(Y - HY)}{(Y - \frac{1}{n}JY)^T(Y - \frac{1}{n}JY)} = 1 - \frac{Y^T(I - H)Y}{Y^T(I - \frac{1}{n}J)Y}, \] where \(H = X(X^TX)^{-1}X^T\) is the projection (hat) matrix; \(I - H\) and \(I - \frac{1}{n}J\) are symmetric and idempotent, which gives the last simplification. Now \[ [\operatorname{Cor}(Y, \hat{Y})]^2 = \frac{[(Y - \frac{1}{n}JY)^T(\hat{Y} - \frac{1}{n}J\hat{Y})]^2}{[(Y - \frac{1}{n}JY)^T(Y - \frac{1}{n}JY)][(\hat{Y} - \frac{1}{n}J\hat{Y})^T(\hat{Y} - \frac{1}{n}J\hat{Y})]}, \] where \[ \hat{Y} = X\hat\beta = X(X^TX)^{-1}X^TY = HY. \] Since the model contains an intercept, the column \(\mathbf{1}\) lies in the column space of \(X\), so \(H\mathbf{1} = \mathbf{1}\) and hence \(HJ = JH = J\). Using \(H^2 = H\) and \(J^2 = nJ\), \[ (Y - \tfrac{1}{n}JY)^T(\hat{Y} - \tfrac{1}{n}J\hat{Y}) = Y^T(I - \tfrac{1}{n}J)H(I - \tfrac{1}{n}J)Y = Y^T(H - \tfrac{1}{n}J)Y, \] and likewise \((\hat{Y} - \tfrac{1}{n}J\hat{Y})^T(\hat{Y} - \tfrac{1}{n}J\hat{Y}) = Y^T(H - \tfrac{1}{n}J)Y\). Therefore \[ [\operatorname{Cor}(Y, \hat{Y})]^2 = \frac{[Y^T(H - \tfrac{1}{n}J)Y]^2}{Y^T(I - \tfrac{1}{n}J)Y \cdot Y^T(H - \tfrac{1}{n}J)Y} = \frac{Y^T(H - \tfrac{1}{n}J)Y}{Y^T(I - \tfrac{1}{n}J)Y} = 1 - \frac{Y^T(I - H)Y}{Y^T(I - \tfrac{1}{n}J)Y} = R^2. \]
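The identity can also be verified numerically on any linear model with an intercept. A self-contained check using R's built-in `mtcars` data (the model `mpg ~ wt + hp` is just an illustrative choice):

```r
# Check numerically that R^2 equals the squared correlation of y with the fitted values
fit <- lm(mpg ~ wt + hp, data = mtcars)
r2   <- summary(fit)$r.squared
cor2 <- cor(mtcars$mpg, fitted(fit))^2
all.equal(r2, cor2)  # TRUE for any linear model that includes an intercept
```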